traditional technique


Investigating Large Language Models' Linguistic Abilities for Text Preprocessing

arXiv.org Artificial Intelligence

Text preprocessing is a fundamental component of Natural Language Processing, involving techniques such as stopword removal, stemming, and lemmatization to prepare text as input for further processing and analysis. Despite the context-dependent nature of the above techniques, traditional methods usually ignore contextual information. In this paper, we investigate the idea of using Large Language Models (LLMs) to perform various preprocessing tasks, due to their ability to take context into account without requiring extensive language-specific annotated resources. Through a comprehensive evaluation on web-sourced data, we compare LLM-based preprocessing (specifically stopword removal, lemmatization and stemming) to traditional algorithms across multiple text classification tasks in six European languages. Our analysis indicates that LLMs are capable of replicating traditional stopword removal, lemmatization, and stemming methods with accuracies reaching 97%, 82%, and 74%, respectively. Additionally, we show that ML algorithms trained on texts preprocessed by LLMs achieve an improvement of up to 6% with respect to the $F_1$ measure compared to traditional techniques. Our code, prompts, and results are publicly available at https://github.com/GianCarloMilanese/llm_pipeline_wi-iat.
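A minimal sketch of the context-free preprocessing the abstract calls traditional, assuming whitespace tokenization. The tiny stopword list and suffix rules are hypothetical and far cruder than real stemmers such as Porter or Snowball; this is not the authors' pipeline. It shows how a fixed list drops "not" whatever the sentence means, the kind of context blindness that motivates trying LLMs.

```python
# Hypothetical, tiny English stopword list (real pipelines use curated lists)
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "on", "not"}

# Hypothetical suffixes for a crude stemmer, ordered longest first
SUFFIXES = ("ing", "ed", "es", "s")


def remove_stopwords(tokens):
    """Drop every listed token, regardless of the sentence it appears in."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def crude_stem(token):
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Whitespace-tokenize, remove stopwords, then stem what remains."""
    tokens = text.split()
    return [crude_stem(t) for t in remove_stopwords(tokens)]


print(preprocess("The movie is not boring"))  # ['movie', 'bor']
```

The negation vanishes and "boring" is over-stemmed to "bor". Many standard stopword lists also include negations such as "not", so this loss of meaning is not unique to the toy list.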


How Supercomputing Will Evolve, According to Jack Dongarra

WIRED

High-performance computing (HPC), once the exclusive domain of scientific research, is now a strategic resource for training increasingly complex artificial intelligence models. This convergence of AI and HPC is redefining not only these technologies but also the ways in which knowledge is produced, and it has taken on a strategic role in the global landscape. To discuss how HPC is evolving, in July WIRED caught up with Jack Dongarra, a US computer scientist who has been a key contributor to the development of HPC software over the past four decades, so much so that in 2021 he earned the prestigious Turing Award. The interview took place at the 74th Nobel Laureate Meeting in Lindau, Germany, which brought together dozens of Nobel laureates as well as more than 600 emerging scientists from around the world. This interview has been edited for length and clarity.


Try with Simpler -- An Evaluation of Improved Principal Component Analysis in Log-based Anomaly Detection

arXiv.org Artificial Intelligence

The rapid growth of deep learning (DL) has spurred interest in enhancing log-based anomaly detection. This approach aims to extract meaning from log events (log message templates) and to develop advanced DL models for anomaly detection. However, these DL methods face challenges such as heavy reliance on training data, labels, and computational resources due to model complexity. In contrast, traditional machine learning and data mining techniques are less data-dependent and more efficient, but less effective than DL. To make log-based anomaly detection more practical, our goal is to enhance traditional techniques so that they match DL's effectiveness. Previous research in a different domain (linking questions on Stack Overflow) suggests that optimized traditional techniques can rival state-of-the-art DL methods. Drawing on this idea, we conducted an empirical study. We optimized unsupervised PCA (Principal Component Analysis), a traditional technique, by incorporating a lightweight semantic-based log representation. This addresses the problem of log events that do not appear in the training data, improving the log representation. Our study compared seven log-based anomaly detection methods (four DL-based, two traditional, and the optimized PCA technique) on public and industrial datasets. Results indicate that the optimized unsupervised PCA technique achieves effectiveness similar to that of advanced supervised/semi-supervised DL methods while being more stable with limited training data and more resource-efficient. This demonstrates the adaptability and strength of traditional techniques through small yet impactful adaptations.
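A minimal sketch of the classic, unoptimized PCA detector that this line of work starts from: each session becomes a vector of log-event counts, PCA on normal sessions defines a "normal" subspace, and a session is flagged when its squared prediction error (SPE, the squared norm of its residual outside that subspace) exceeds a threshold. It does not reproduce the paper's semantic log representation. The event names, synthetic counts, `k=2`, and the quantile threshold are all assumptions for illustration.

```python
import numpy as np


def fit_pca(X, k):
    """Fit PCA on (assumed normal) training rows and keep the top-k components."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T  # basis of the "normal" subspace, shape (features, k)


def spe(X, mean, P):
    """Squared prediction error: squared norm of each row outside the normal subspace."""
    centered = X - mean
    residual = centered - centered @ P @ P.T
    return np.sum(residual ** 2, axis=1)


def detect(X_train, X_test, k, quantile=0.99):
    """Flag test rows whose SPE exceeds a high quantile of the training SPE.

    The quantile threshold is a simple stand-in; PCA detectors in the
    literature often derive the threshold from the Q-statistic instead.
    """
    mean, P = fit_pca(X_train, k)
    threshold = np.quantile(spe(X_train, mean, P), quantile)
    return spe(X_test, mean, P) > threshold


# Hypothetical event-count vectors per session: [open, read, close, error]
rng = np.random.default_rng(0)
opens = rng.integers(1, 3, size=200)   # 1 or 2 opens per session
reads = rng.integers(1, 21, size=200)  # 1..20 reads per session
X_train = np.column_stack([opens, reads, opens, np.zeros(200)])
X_train = X_train + rng.normal(0.0, 0.05, size=X_train.shape)  # small jitter

X_test = np.array([[1, 7, 1, 0],   # balanced open/close, no errors
                   [1, 7, 1, 5]],  # same session plus five error events
                  dtype=float)
flags = detect(X_train, X_test, k=2)
print(flags)
```

Because normal sessions always close what they open and never log errors, the error events push the second session far outside the learned subspace, while the first session stays inside it. Unsupervised PCA needs no labels, which is part of why it is cheap compared with supervised DL models.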


Deep Learning Vs Traditional Computer Vision Techniques: Which Should You Choose?

#artificialintelligence

Deep Learning (DL) is undeniably one of the most popular tools used in the field of Computer Vision (CV). It is popular enough to be deemed the current de facto standard for training models that are later deployed in CV applications. But is DL the only option available for developing CV applications? What about the traditional techniques that have served the CV community for decades? Has the time already come to drop traditional CV techniques altogether in favor of DL?


Tensorflow Tutorial Uses Python

#artificialintelligence

Around the Hackaday secret bunker, we've been talking quite a bit about machine learning and neural networks. There's been a lot of renewed interest in the topic recently because of the success of TensorFlow. If you are adept at Python and remember your high school algebra, you might enjoy [Oliver Holloway's] tutorial on getting started with TensorFlow in Python. He then shows some basic setup operations. From there, he has the software "learn" how to classify random points as falling either inside or outside a circle.
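The toy task the tutorial uses (classifying random points as inside or outside a circle) can be sketched without TensorFlow. This is not Holloway's code: it swaps in plain NumPy logistic regression on hand-built squared features, and the sampling square [-2, 2]², radius 1, learning rate, and step count are assumptions.

```python
import numpy as np


def make_points(n, radius=1.0, seed=0):
    """Sample n points uniformly in [-2, 2]^2; label 1 if inside the circle."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-2.0, 2.0, size=(n, 2))
    labels = (np.sum(pts ** 2, axis=1) < radius ** 2).astype(float)
    return pts, labels


def features(pts):
    """Map (x, y) to (x, y, x^2, y^2, 1); the circular boundary becomes linear."""
    return np.column_stack([pts, pts ** 2, np.ones(len(pts))])


def train(pts, labels, lr=0.5, steps=2000):
    """Full-batch gradient descent on the logistic (cross-entropy) loss."""
    X = features(pts)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probability of "inside"
        w -= lr * X.T @ (p - labels) / len(labels)
    return w


def predict(w, pts):
    """Label a point 1 (inside) when the linear score is positive."""
    return (features(pts) @ w > 0).astype(float)


train_pts, train_labels = make_points(1000, seed=0)
test_pts, test_labels = make_points(500, seed=1)
w = train(train_pts, train_labels)
accuracy = np.mean(predict(w, test_pts) == test_labels)
print(f"held-out accuracy: {accuracy:.3f}")
```

The squared features do the work a hidden layer would otherwise have to learn: a purely linear model on (x, y) cannot separate a disc from its surroundings, while one on (x², y²) can.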


Machine learning and what it means for marketing

#artificialintelligence

Machine learning currently has a high profile and is riding a wave of media exposure, with articles on subjects ranging from self-driving cars and self-landing rockets to computers beating the world's best players at Go, often described as the most computationally complex board game in the world. Is there an opportunity for your organisation, and the marketers within it, to make use of this "new" technology? Machine learning techniques were developed as long ago as the 1950s, but with the advent of big data and large analytical engines, both the prevalence of these techniques and the ease of applying them have increased. Additionally, organisations now understand the value that analytics can bring, and so are willing to place it front and center in their plans and to invest more time and resources in exploring new and better techniques. Segmentation and predictive models, for instance, have proven themselves time and again in the marketing world, but to a certain extent they require a higher degree of knowledge to understand. In some cases, a machine learning technique relieves the user of the statistical work while providing just as good an answer as a traditional technique.